ApplyMomentum

Applies a Momentum or improved-momentum (Nesterov) optimizer update to a weight tensor.

\[\begin{split}\begin{aligned} accu_t &= moment \cdot accu_{t-1} + g_t \\ update_t &= \begin{cases} (accu_t \cdot moment + g_t), & \text{if nesterov = True} \\ accu_t, & \text{otherwise} \end{cases} \\ weight_t &= weight_{t-1} - learning\_rate \cdot update_t \end{aligned}\end{split}\]
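The update rule above can be sketched as a plain scalar loop. This is only a host-side reference under stated assumptions, not the library's implementation: the name `ref_apply_momentum` is hypothetical, it handles fp32 only, and it processes the half-open index range [start, end) in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar reference for the formula above.
 * Not part of the library; intended for checking results on a host. */
static void ref_apply_momentum(float *weight, float *accumulate,
                               const float *gradient,
                               float learning_rate, float moment,
                               bool nesterov, int start, int end)
{
    assert(weight != NULL && accumulate != NULL && gradient != NULL);
    assert(0 <= start && start <= end);

    for (int i = start; i < end; ++i) {
        /* accu_t = moment * accu_{t-1} + g_t */
        float accu = moment * accumulate[i] + gradient[i];

        /* With nesterov, update_t = accu_t * moment + g_t; otherwise accu_t. */
        float update = nesterov ? accu * moment + gradient[i] : accu;

        /* Both tensors are written back in place. */
        accumulate[i] = accu;
        weight[i] -= learning_rate * update;
    }
}
```

For example, with weight = 1, accumulate = 0, gradient = 1, learning_rate = 0.1, and moment = 0.9, one step gives accumulate = 1 and weight = 0.9 without Nesterov, and weight = 0.81 with Nesterov (update = 1 · 0.9 + 1 = 1.9).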

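Inputs:

weight - Start address of the weight tensor to be updated.

accumulate - Start address of the momentum accumulation tensor.

gradient - Start address of the gradient tensor.

learning_rate - Learning rate.

moment - Momentum coefficient.

nesterov - Whether to enable Nesterov momentum.

start - Index of the first element to process (inclusive).

end - Index one past the last element to process (exclusive).

core_mask (int, optional) - Core mask (shared-storage version only).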
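Because start is inclusive and end is exclusive, a call covers end - start elements, and consecutive ranges tile a tensor with no gap or overlap. The sketch below illustrates that convention for a caller that wants to process a tensor in several consecutive ranges. The helper `chunk_range` is hypothetical and not part of the library; how the library itself distributes work across the cores selected by core_mask is not described here.

```c
#include <assert.h>

/* Hypothetical helper: computes the k-th of `parts` consecutive
 * half-open ranges [*start, *end) that together cover [0, n). */
static void chunk_range(int n, int parts, int k, int *start, int *end)
{
    assert(n >= 0 && parts > 0 && 0 <= k && k < parts);

    int base = n / parts;  /* minimum number of elements per chunk */
    int rem = n % parts;   /* the first `rem` chunks get one extra element */

    *start = k * base + (k < rem ? k : rem);
    *end = *start + base + (k < rem ? 1 : 0);
}
```

Each resulting pair can be passed as the start and end arguments of a separate call; since the end of one chunk equals the start of the next, every element is updated exactly once.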

Outputs:

weight - Updated weight tensor, written back in place.

accumulate - Updated momentum tensor, written back in place.

Supported platforms:
FT78NE MT7004

Notes

FT78NE supports the fp32 data type.

MT7004 supports the fp16 and fp32 data types.

Shared-storage version:

void hp_applymomentum_s(half *weight, half *accumulate, const half *gradient, float learning_rate, float moment, bool nesterov, int start, int end, int core_mask)

void fp_applymomentum_s(float *weight, float *accumulate, const float *gradient, float learning_rate, float moment, bool nesterov, int start, int end, int core_mask)

C usage example:

// FT78NE multi-core example
#include <stdio.h>
#include <stdbool.h>

int main(void) {
    float *weight = (float *)0xA0000000;      // DDR memory
    float *accumulate = (float *)0xB0000000;
    float *gradient = (float *)0xC0000000;
    int start = 0;
    int end = 4096;
    int core_mask = 0xff;
    float learning_rate = 1e-2f;
    float moment = 0.99f;
    bool nesterov = false;
    fp_applymomentum_s(weight, accumulate, gradient,
                       learning_rate, moment, nesterov,
                       start, end, core_mask);
    return 0;
}

Private-storage version:

void hp_applymomentum_p(half *weight, half *accumulate, const half *gradient, float learning_rate, float moment, bool nesterov, int length)

void fp_applymomentum_p(float *weight, float *accumulate, const float *gradient, float learning_rate, float moment, bool nesterov, int length)

C usage example:

// MT7004 single-core example
#include <stdio.h>
#include <stdbool.h>

int main(void) {
    half *weight = (half *)0x10000000;       // L2 memory
    half *accumulate = (half *)0x10002000;
    half *gradient = (half *)0x10004000;
    int length = 2048;
    float learning_rate = 5e-3f;
    float moment = 0.9f;
    bool nesterov = true;
    hp_applymomentum_p(weight, accumulate, gradient,
                       learning_rate, moment, nesterov,
                       length);
    return 0;
}